Abstract—This paper introduces a novel algorithm for cardinality
estimation, i.e., estimation of the number of nodes, in large-scale
anonymous graphs using statistical inference methods. Applications of
this work include estimating the number of sensor devices, online
social users, active protein cells, etc. In anonymous graphs, each
node possesses little or no information on the network
topology. In particular, this paper assumes that each node only
knows its unique identifier. The aim is to estimate the cardinality
of the graph and the neighbours of each node by querying a small
portion of the nodes. While the former allows the design of more
efficient coding schemes for the network, the latter provides a
reliable way for routing packets. As a reference for comparison,
this work considers the Best Linear Unbiased Estimator (BLUE).
For dense graphs and specific running times, the proposed
algorithm produces a cardinality estimate proportional to the
BLUE. Furthermore, for an arbitrary number of iterations, the
estimate converges to the BLUE as the number of queried nodes
tends to the total number of nodes in the network. Simulation
results confirm the theoretical results by revealing that, for a
moderate running time, querying a small group of nodes is sufficient
to estimate 95% of the whole network.

Index Terms—Anonymous networks, sensor networks, cardinality
estimation, node counting.

I. Introduction 

Wireless Sensor Networks (WSNs) have been a great success
in the past decades. Generally, a WSN refers to a set
of small electronic devices (sensors) capable of monitoring
and measuring certain phenomena, e.g., temperature, pressure,
floods, fires, etc., usually in hazardous and unreachable
environments. A WSN is typically composed of hundreds to
millions of nodes capable of interacting and communicating
with each other. Due to their small size, these devices have
limited resources such as memory, computation power, battery
lifetime and bandwidth.

This paper is interested in estimating the cardinality, i.e., 
the size, of a network. In other words, the aim is to determine 
the number of nodes distributed randomly and uniformly in a 
given field. There are several benefits for cardinality estimation 
in graphs such as energy efficiency IT], mobile communication 
and coding schemes design El, and distributed storage 0, HI. 
Furthermore, the paper proposes that each node discovers its
neighbours, which helps network designers enhance coverage
and connectivity 0.

Applications of network size estimation are not limited
to WSNs. With the shift in design from centralized
architectures to decentralized ones, the problem is
increasingly relevant due to its applications in social networks
and artificial intelligence 0-111. Even though decentralized
systems are more scalable and robust to failure, their use
makes the estimation of the parameters of the whole network
challenging.

For large-scale networks, where the network size can reach
a couple of million nodes, it is computationally expensive to
exhaustively search all nodes to infer information about the entire
network. Moreover, it is infeasible for each node to communicate
with the data collector (DC). In order to determine the size of
such massive systems, two trends can be distinguished in the
literature, namely node counting and size estimation.

The problem of node counting in an undirected graph is
introduced in 0. Generally, the aim is to visit all nodes of
a graph while avoiding, as much as possible, revisiting nodes. The
problem has numerous applications in artificial intelligence
and control theory. However, the authors in 0 prove it to be
NP-complete, with a time complexity expressed in terms of n,
the cardinality of the graph. Therefore, such an approach is
unsuited for large-scale networks.

The use of statistical inference for cardinality estimation of
a given network first appears in the literature with the German
tank problem [10], in which the aim is to estimate the total
number of tanks given the serial numbers of the captured ones.
The fundamental idea of cardinality estimation is to sample a
subset of the entire population and the available information. In
other words, only a small portion of nodes is queried to infer
information about the status of the whole network.

A. Related Work 

Due to the complexity of node counting, numerous research
works focus on estimating the cardinality and mean
edge degree of a graph. The authors in 0, [11] state a method
for determining the total number of nodes in a graph using
data flooding and random walks. They present an algorithm
for node estimation in large-scale wireless sensor networks
using random walks that travel through the network based on
a predefined probability distribution.

Ribeiro et al. [12], [13] propose a scheme for estimating the
network parameters in directed graphs using random walks.
Their algorithm precisely predicts the out-degree distributions
of a variety of real-world graphs. Similarly, Dalal et al.
M study the problem in the context of robust visual object
recognition. The authors in [10] present two algorithms for
estimating the total number of online users in social networks
by sampling from graphs assumed to have a stationary
distribution.

The authors in CSl, M propose a model for estimating 
the network parameters using random walks in graphs. In 
particular, by sampling from the graph, they propose a method 
to determine the average edge degree rather than the individual 
node degree. The problem of estimating the mean degree of 
a graph is first suggested by Feige et al. El. The authors in 
El present a sampling method for node degree estimations 
in a sampled network and the authors in ITJ show a way to 
obtain content properties by testing a small set of vertices in 
the graph. 

While the authors in im propose a model for distributed
cardinality estimation in anonymous networks using statistical
inference methods, the authors in [19] present a scheme for
estimating the number of reachable neighbours of a given
node and the size of the transitive closure. They present an
$O(n)$ time complexity algorithm based on Monte Carlo that
estimates, with a small error, the sizes of all reachability sets
and the transitive closure of a graph.

B. Contribution 

The difficulty of the network size estimation heavily depends
on the assumptions and features of the system. This paper
considers the anonymous networks framework fSl, where
nodes only know their unique identifier (ID). The authors in
[20] show that with a centralized strategy, the node estimation
can be obtained in finite time with probability one. For
non-unique IDs, the authors in [21], [22] demonstrate that
the estimation cannot be performed with probability one in
limited time or with a bounded computational complexity. The
problem is, then, to find estimators that trade off a small
error probability against a moderate computational complexity.

This paper proposes a hybrid scheme that not only performs
node counting, which can be run for an arbitrary number of time
rounds, but further uses the output to carry out the network size
estimation. Such an estimate helps network designers to design the
coding schemes appropriately. The proposed system combines the
advantages of both the node counting algorithms and the
node estimation ones. Given that the time and computation
complexity of node counting algorithms is high, their use in
large-scale networks is prohibitive. On the other hand, network
size estimation algorithms have, in general, high variance.
Depending on the initialization parameters, the estimate of the
proposed scheme balances these two effects and can be made
arbitrarily close to the network cardinality. Furthermore,
the algorithm simultaneously discovers the
neighbours of each node. Such knowledge is crucial for data
routing, which can be combined with the code design to achieve
efficient resource utilization. Due to space limitations, this work
considers nodes with unique IDs. However, this assumption
can be removed in future work by exploiting the inverse
birthday paradox.

The rest of this paper is organized as follows: In Section II,
the system model and the problem formulation are presented.
Section III illustrates the proposed cardinality estimation
algorithm, whose performance is analysed in
Section IV. Simulation results are shown and discussed in
Section V. Finally, Section VI concludes the paper.

II. Network Model and Problem Formulation 

A. Network Model 

Consider a wireless sensor network $\mathcal{N}$ with $n$ sensor nodes
that are randomly and uniformly distributed in a region $A =
[0, L] \times [0, W]$ for some $L, W > 0$. The network $\mathcal{N}$ can be
considered as an abstract graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with a set of
nodes $\mathcal{V}$ and a set of edges $\mathcal{E}$, where $n = |\mathcal{V}|$. The set $\mathcal{V} =
\{s_1, \cdots, s_n\}$ represents the sensors that measure information
about a specific field, and $\mathcal{E}$ represents the set of links between
the sensors.

Two arbitrary sensors $s_i$ and $s_j$, $1 \le i \neq j \le n$, are
connected if they are in the transmission range of each other.
Assuming that the transmission range is circular, let $R$ be
its radius$^1$. Therefore, $s_i$ and $s_j$ are connected if and only
if $d(s_i, s_j) \le R$, where $d(\cdot, \cdot)$ is the distance operator.
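The network model above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: the function name `random_geometric_graph`, the seed, and the choice of the Euclidean distance for $d(\cdot, \cdot)$ are assumptions made for illustration, and a node is not listed among its own neighbours.

```python
import math
import random

def random_geometric_graph(n, length, width, radius, seed=None):
    """Place n nodes uniformly in [0, length] x [0, width] and link pairs within radius."""
    rng = random.Random(seed)
    positions = [(rng.uniform(0, length), rng.uniform(0, width)) for _ in range(n)]
    neighbours = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            # Assumption: d(., .) is the Euclidean distance.
            if math.dist(positions[i], positions[j]) <= radius:
                neighbours[i].add(j)
                neighbours[j].add(i)
    return positions, neighbours

positions, neighbours = random_geometric_graph(n=300, length=1.0, width=1.0, radius=0.1, seed=1)
mean_degree = sum(len(s) for s in neighbours.values()) / len(neighbours)
print(f"mean degree: {mean_degree:.2f}")
```

The pairwise loop is quadratic in the number of nodes, which is acceptable for a few hundred nodes; a spatial grid would be preferable for much larger networks.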

The paper assumes that neither the number $n$ of nodes,
i.e., the network size, nor their connections, i.e., the network
topology, is known. However, a bound on the network size
$N_{\max} \ge n$ is known by the data fusion center. This scenario
can be seen as a network after a long running time or a disaster.
Initially, the network is composed of $N_{\max}$ nodes, each having
a unique ID. After a long running time or a disaster, some of
the nodes may disappear from the graph, leaving a graph with
$n \le N_{\max}$ nodes with unique IDs. Let $\mathrm{ID}_i$ be the ID of sensor
$s_i$.

B. Network Protocol 

In the considered network model, each node knows only its
unique identifier. Communication between nodes is performed
by broadcasting the information to transmit. Note that a node
need not transmit additional bits indicating its ID with
the information packet. Moreover, no acknowledgement is
expected from sensors that successfully receive a packet.
Transmissions are subject to erasure at the sensors, with
probability $q_{ij}$ for a packet sent by sensor $s_i$ to sensor $s_j$.
In other words, for a sensor $s_i$ broadcasting data, each sensor
$s_j \in \mathcal{S}_i$ successfully receives the
data with probability $1 - q_{ij}$, where $\mathcal{S}_i$ is the set of neighbours
of node $s_i$, defined as follows.

Definition 1. Denote by $\mathcal{S}_i$ the set of neighbours of a node
$s_i$, $1 \le i \le n$. In other words, $\mathcal{S}_i = \{s_j \in \mathcal{V}$ such that
$d(s_i, s_j) \le R\}$.

This paper considers static nodes in the network. Therefore,
since the nodes do not move, their relative positions in the
network remain identical, which results in an unchanged set
of neighbours for all nodes.

$^1$The algorithm is independent of the considered transmission range.
However, the performance analysis provided in the rest of the paper assumes
a circular transmission range with the same radius for all nodes.
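Algorithm 1 Initialization Phase.

Require: $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, with $\mathcal{V} = \{s_1, \cdots, s_n\}$, and the initial transmit probabilities.
  Initialize $\mathcal{T}(0) = \emptyset$.
  for all $s_i \in \mathcal{V}$ do
    Initialize $P_{s_i} = \{\mathrm{ID}_i\}$.
    Initialize $f_{s_i} = f_{s_i}^{\text{initial}}$.
  end for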


C. Problem Formulation 

Given the aforementioned network model and protocol, this
paper's objectives are to:

1) Estimate the number of nodes $n$ by querying $K$ randomly
chosen nodes in the network. Let $\mathcal{K}$ be the set of nodes in the
network that can be queried by the data collector. This
set of nodes is randomly picked from the set of alive and
dead nodes, with $|\mathcal{K}| = K \ll |\mathcal{V}| = n \le N_{\max}$.

2) Discover locally, for an arbitrary node $s_i$, its set of
neighbours $\mathcal{S}_i$.

Without loss of generality, the DC is assumed to know
the IDs of the nodes in the initial network comprising $N_{\max}$
sensors. The selection of the queried nodes is performed by
sampling uniformly without replacement from this set of IDs.
Such a method results in $K$ nodes randomly picked from the
set of alive and dead ones. Throughout the paper, the notation
$\mathcal{U}(0, 1)$ refers to the uniform distribution over $(0, 1)$.

III. Proposed Node Estimation Algorithm 

This section introduces the hybrid node counting and estimation
algorithm, which estimates the total number
of nodes in a network. The algorithm runs in three distinct
phases: the initialization, the knowledge distribution, and the
query phases. In the initialization phase, the initial packets of
the nodes and their transmit probabilities are set. In the knowledge
distribution phase, the information about the network
is disseminated among the surviving nodes from neighbour
to neighbour. Finally, in the query phase, the DC collects
the information about the network by querying some nodes and
infers the size of the whole system.

A. Initialization Phase 

In the initial step, each node in the network generates a
packet containing its ID. As the packet size is a crucial
limitation, this paper aims to keep it small. For a network with
initially $N_{\max}$ nodes, the distinct IDs can be encoded using
$\lceil \log_2(N_{\max}) \rceil$ bits, where $\lceil \cdot \rceil$ is the ceiling function. Therefore, the
maximum size a packet can reach at any node in the network
is $n \lceil \log_2(N_{\max}) \rceil$ bits, as only $n$ nodes are alive. Such a packet size
is convenient for practical scenarios as it scales logarithmically
with $N_{\max}$ and linearly with $n$.
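As a quick numerical check of this packet size bound, the short sketch below evaluates it for the values used later in the simulations ($N_{\max} = 350$, $n = 300$). The function names are illustrative assumptions; the integer `bit_length` trick is used instead of a floating-point logarithm so that the ceiling is exact.

```python
def id_bits(n_max):
    """Bits needed to encode n_max distinct IDs, i.e. ceil(log2(n_max)) for n_max >= 1."""
    return (n_max - 1).bit_length()

def max_packet_bits(n_alive, n_max):
    """Largest packet a node can hold: one ID per alive node."""
    return n_alive * id_bits(n_max)

print(id_bits(350))               # 9
print(max_packet_bits(300, 350))  # 2700
```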

Each node $s_i$ also initializes its transmit probability to
$f_{s_i}^{\text{initial}}$, where $f_{s_i}^{\text{initial}}$ is the probability that the node broadcasts
the packet it already holds to its neighbours. Whereas a small
value of the initial probability means that there is a small amount
of communication between nodes in the network, a value
$f_{s_i}^{\text{initial}} = 1$ means that all nodes broadcast their packets at each


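Algorithm 2 Knowledge Dissemination Phase.
Require: $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, with $\mathcal{V} = \{s_1, \cdots, s_n\}$.
for $t = 1, 2, \cdots$ do
  Set $\mathcal{T}(t) = \emptyset$.
  for all $s_i \in \mathcal{V}$ do
    for all $s_j \in \mathcal{T}(t - 1)$ do
      if $P_{s_j}$ heard then
        Set $\mathcal{S}_i = \mathcal{S}_i \cup s_j$.
        if $P_{s_j} \not\subseteq P_{s_i}$ then
          Set $P_{s_i} = (P_{s_i} \cup P_{s_j}) \setminus \mathrm{ID}_i$.
          Set $P_{s_i} = \{P_{s_i}, \mathrm{ID}_i\}$.
          Increase $f_{s_i}$.
        end if
      end if
    end for
    Sample $U_{s_i}$ from $\mathcal{U}(0, 1)$.
    if $U_{s_i} < f_{s_i}$ then
      $s_i$ broadcasts $P_{s_i}$.
      Set $f_{s_i} = f_{s_i}^{\text{initial}}$.
      Set $\mathcal{T}(t) = \mathcal{T}(t) \cup s_i$.
    end if
  end for
end for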


iteration. Let $\mathcal{T}(t)$ be the set of nodes that transmitted a packet
at time instant $t$, with $\mathcal{T}(0) = \emptyset$. Algorithm 1 summarizes the
steps of the initialisation phase.

Remark 1. The proposed algorithm can be easily extended
to perform topology discovery, i.e., the estimation of both $\mathcal{V}$
and $\mathcal{E}$, by modifying the initial packets of each node. Each
node $s_i$ generates a packet containing both its ID and its
$(X, Y)$ coordinates. Assuming that coordinates are encoded
using $V$ bits, e.g., $V = 32$ bits to encode a real number,
the maximum size a packet can reach is $2Vn\lceil \log_2(N_{\max}) \rceil$.
Therefore, the size of the topology discovery packet scales in
the same manner as the one of the cardinality estimation. Due
to space limitations, the performance analysis of the topology
discovery scheme is omitted in this paper as it follows similar
steps to the ones presented herein.

B. Knowledge Distribution Phase 

In this phase, the knowledge is distributed among the alive
nodes in the network from neighbour to neighbour. At each
iteration of the algorithm, if a node $s_i$ receives a packet
from a node $s_j$, whose ID can be determined by examining
the last ID in the received packet, it adds that node to its set
of neighbours $\mathcal{S}_i$. Depending on the content of the received
packet, two scenarios can be distinguished:

• The packet does not contain new information for $s_i$,
i.e., $P_{s_j} \subseteq P_{s_i}$. The packet is discarded and the buffer
is not updated.

• The packet brings new information to the node, i.e.,
$P_{s_j} \not\subseteq P_{s_i}$. The node updates its buffer and increases its
transmit probability. The more innovative packets a node
receives, the more its transmit probability increases. This
Algorithm 3 Data queries and network size estimation.
Require: $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ and $\mathcal{K}$ with $|\mathcal{K}| = K \ll n$.
  Initialize $P = \emptyset$.
  for all $s_i \in \mathcal{K}$ do
    $P = P \cup P_{s_i}$.
  end for
  Set $Z = |P|$.


is motivated by the fact that the more new information a
node receives, the better candidate it is to transmit. To be
able to estimate locally the neighbours of the nodes, each
node first removes its ID from the packet it possesses and
then appends it to the end of the packet.

Afterwards, each node $s_i$ samples from the uniform
distribution $\mathcal{U}(0, 1)$ and decides, according to $f_{s_i}$, whether or not to
broadcast $P_{s_i}$. After broadcasting data, the node resets its transmit
probability to the initial value. This is motivated by the fact
that after transmission, if all neighbours received the packet,
then node $s_i$ does not bring new information anymore unless
it receives new packets. Algorithm 2 summarizes the steps of
the knowledge distribution phase.
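The dissemination steps described above can be sketched in a few lines. This is an illustrative simplification, not the authors' implementation: packets are stored as unordered sets and the sender is tracked explicitly instead of being read from the last ID of the packet, all nodes share one erasure probability `q`, and the additive update `min(1, f + step)` with the hypothetical parameter `step` is only an assumed way of increasing the transmit probability.

```python
import random

def disseminate(neighbours, f_init, q, rounds, step=0.1, seed=None):
    """Run `rounds` iterations of a broadcast-and-merge dissemination.

    neighbours: dict node -> set of neighbouring nodes (true radio links).
    Returns (packets, discovered): packets[i] is the set of IDs known to i,
    discovered[i] is the set of neighbours that i has heard from.
    """
    rng = random.Random(seed)
    packets = {i: {i} for i in neighbours}
    discovered = {i: set() for i in neighbours}
    f = {i: f_init for i in neighbours}
    transmitted = {}  # sender -> copy of the packet it broadcast in the previous round
    for _ in range(rounds):
        # Reception: every node processes the broadcasts of the previous round.
        for i in neighbours:
            for j, packet in transmitted.items():
                if j in neighbours[i] and rng.random() >= q:  # packet not erased
                    discovered[i].add(j)
                    if not packet <= packets[i]:  # innovative packet
                        packets[i] |= packet
                        f[i] = min(1.0, f[i] + step)  # assumed update rule
        # Transmission: each node broadcasts with its current probability.
        transmitted = {}
        for i in neighbours:
            if rng.random() < f[i]:
                transmitted[i] = set(packets[i])
                f[i] = f_init  # reset after broadcasting
    return packets, discovered

line = {0: {1}, 1: {0, 2}, 2: {1}}
packets, discovered = disseminate(line, f_init=1.0, q=0.0, rounds=3)
print(packets)
```

With certain transmission and no erasure, three rounds are enough for every node of this three-node line to learn all IDs.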

C. Query Phase 

In this phase, a DC queries some nodes from the set of nodes
to retrieve information about the current status of the network
$\mathcal{N}$ and infer its size. If the queried node $s_i$ is alive, then it
transmits its packet $P_{s_i}$. Otherwise, there is no transmission,
and the packet of that node is the empty set.

After querying the nodes, their packets are processed using
the union operator and by counting the number of IDs. In
other words, the quantity $Z$, the counting estimation, can be
obtained by $Z = |P|$, where $|\cdot|$ is the cardinality operator.
Algorithm 3 summarizes the steps of the data collection and
network size estimation phase. The next section relates the
counting estimation to the Best Linear Unbiased Estimator
(BLUE) $\hat{n}$ of the network size.
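A minimal sketch of the query phase follows. The function name and data layout are assumptions for illustration: `packets` maps each alive node to the set of IDs it holds, and an ID absent from `packets` is treated as a dead node that returns the empty set.

```python
import random

def query_and_count(packets, all_ids, k, seed=None):
    """Query k IDs drawn uniformly without replacement and count the distinct IDs seen."""
    rng = random.Random(seed)
    queried = rng.sample(sorted(all_ids), k)
    known = set()
    for node in queried:
        # A dead node does not answer, which is modelled as an empty packet.
        known |= packets.get(node, set())
    return len(known)

# Three alive nodes (0, 1, 2) out of five initial IDs; IDs 3 and 4 are dead.
packets = {0: {0, 1}, 1: {1}, 2: {0, 2}}
print(query_and_count(packets, range(5), k=5))  # 3
```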

IV. Performance Analysis 

Let $X_{ij}(t)$ be a Bernoulli random variable denoting
whether node $s_i$ knows that node $s_j$ is alive. Let $X_i(t) =
(X_{i1}(t), \cdots, X_{iN_{\max}}(t))$ be the vector containing the
knowledge of node $s_i$. From Algorithm 2, $P_{s_i}$ is the realisation of
the random variable $X_i(t)$ at each time slot $t$.

Let $\mathbf{Z}(t) = (Z_1(t), \cdots, Z_{N_{\max}}(t))$ be a random vector
where $Z_i(t)$, $1 \le i \le N_{\max}$, is a Bernoulli random variable
denoting whether the central unit knows that node $s_i$ is alive when
the data collection is performed at time slot $t$. Let
$Z(t) = \sum_{i=1}^{N_{\max}} Z_i(t)$. From Algorithm 3, $Z$ is the realisation of $Z(t)$ at
query time $t$. Given the data collection equation, the random
variable $Z_i(t)$, $1 \le i \le N_{\max}$, can be written as follows:

$$Z_i(t) = \max_{s_j \in \mathcal{K}} X_{ji}(t). \tag{1}$$

Define $\mathcal{A}$ as the set of nodes that are alive and $\mathcal{D} = \mathcal{N} \setminus \mathcal{A}$
the set of nodes that are dead, where $\mathcal{N}$ is the set of all nodes
in the network. It can be easily seen that $|\mathcal{N}| = N_{\max}$ and
$|\mathcal{A}| = n$.


Definition 2. Let $B_t(s_i)$, $t \ge 1$, be the $t$-degree neighbours
function defined as:

$$B_t(s_i) = \bigcup_{s_j \in B_{t-1}(s_i)} \mathcal{S}_j, \tag{2}$$

with $B_0(s_i) = s_i$. At time $t$, the function $B_t(s_i)$ represents the
neighbours (of the neighbours) $\times\, (t - 1)$ of node $s_i$.
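The recursive definition can be read as a breadth-first expansion. The sketch below is one possible reading under stated assumptions: the adjacency dictionary `neighbours` is hypothetical, and each node is counted in its own neighbour set (consistent with Definition 1, where the distance of a node to itself is zero), so that the sets are nested.

```python
def t_degree_neighbours(neighbours, node, t):
    """Return B_t(node): the union of the neighbour sets of every member of B_{t-1}(node)."""
    ball = {node}  # B_0
    for _ in range(t):
        # Assumption: a node belongs to its own neighbour set.
        ball = set().union(*(neighbours[member] | {member} for member in ball))
    return ball

path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
for t in range(4):
    print(t, sorted(t_degree_neighbours(path, 0, t)))
```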

This section assumes that nodes have the same initial
transmit probability $f$ and the same erasure probability $q$.
The following lemma links the estimator $Z(t)$ given by
Algorithm 3 to the BLUE $\hat{n}$ of the network size $n$ for $t = 0, 1$
and $t = \infty$:

Lemma 1. The estimator $Z(t)$ for $t = 0, 1$ and $t = \infty$ is
proportional to the BLUE $\hat{n}$ of the network size. In other
words, it can be written as follows:

$$Z(0) = N_{\max}\,\alpha_0\,\hat{n}, \qquad Z(1) = N_{\max}\,\alpha_0\alpha_1\,\hat{n}, \qquad \lim_{t \to \infty} Z(t) = \hat{n}, \tag{3}$$

where $\alpha_0 = \dfrac{K}{N_{\max}^2}$ and $\alpha_1 = 1 + \dfrac{N_{\max} - K}{LW}\,\pi R^2 f(1 - q)$.

Proof: The proof can be found in Appendix A. ■
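To make the scaling factors of Lemma 1 concrete, the sketch below evaluates $\alpha_0$, $\alpha_1$ and the resulting expected count when $\hat{n}$ is replaced by its mean $n$. The function names are illustrative assumptions, and the value $K = 35$ is chosen only as an example; the remaining values mirror the simulation setting ($N_{\max} = 350$, $n = 300$, unit square, $R = 0.1$, erasure $0.1$).

```python
import math

def alpha0(k, n_max):
    return k / n_max**2

def alpha1(k, n_max, length, width, radius, f, q):
    return 1 + (n_max - k) / (length * width) * math.pi * radius**2 * f * (1 - q)

def expected_count(t, n, k, n_max, **geometry):
    """N_max * alpha_0 (* alpha_1) * n, i.e. the mean of Z(t) for t in {0, 1}."""
    if t not in (0, 1):
        raise ValueError("closed-form factors are only stated for t = 0 and t = 1")
    scale = n_max * alpha0(k, n_max)
    if t == 1:
        scale *= alpha1(k, n_max, **geometry)
    return scale * n

geometry = dict(length=1.0, width=1.0, radius=0.1, f=0.5, q=0.1)
for t in (0, 1):
    print(t, expected_count(t, n=300, k=35, n_max=350, **geometry))
```

When `k` equals `n_max`, `alpha1` reduces to 1 and the expected count equals `n`, which matches the remark that querying all nodes recovers the BLUE.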

From the expressions proposed in Lemma 1, it is clear
that when the number of queried nodes is $K = N_{\max}$,
the estimator $Z(t)$ is equal to the BLUE of the network size.
Such a property linking the counting estimator to the BLUE is
conjectured to be valid for all time instants $t$:

Conjecture 1. The estimator $Z(t)$ is proportional to the
BLUE $\hat{n}$ of the network size and can be written as:

$$Z(t) = N_{\max} \prod_{k=0}^{t} \alpha_k\, \hat{n}, \tag{4}$$

with

$$\prod_{k=0}^{\infty} \alpha_k = 1/N_{\max} \quad \text{and} \quad \lim_{t \to \infty} \prod_{k=0}^{t} \alpha_k = 1/N_{\max}. \tag{5}$$

V. Simulation Results 

This section presents the simulation results of the proposed
counting algorithm. In all the simulations, the bound is set to
$N_{\max} = 350$ for a network containing $n = 300$ nodes. The
field is set to the unit square, the connectivity radius to $R =
0.1$ and the average packet erasure to $Q = 0.1$. Due to space
limitations, the performance of the network size estimator is
not presented.

Figure 1 shows the relation between the number of queried
nodes and the number of estimated nodes in the network for
various query times $t$ and transmit probabilities $F$. We notice
that querying 10% or more of the nodes gives a good estimation
of the network size. Besides, increasing the initial transmit
probability $F$ or the query time $t$ improves
the performance.

Figure 2 shows the initial transmit probability $F$ versus the
time $t$ at which the total estimation of network nodes is 95%
or more, for various numbers of queried nodes $K$. One can notice that for
fixed $F = 0.5$, increasing the number of queried nodes from $K = 10$ to
$K = 20$ reduces the average time $t$ needed to disseminate the nodes'
information in the network.
Fig. 1. Number of queried nodes versus the number of estimated nodes for
different combinations of query time $t$ and initial transmit probability $F$. The
network contains $n = 300$ nodes bounded by $N_{\max} = 350$. The connectivity
radius is $R = 0.1$ and the average erasure $Q = 0.1$.

Fig. 2. Initial transmit probability $F$ versus the average time to perform
95% estimation of the network for different numbers of queried nodes $K$. The
network contains $n = 300$ nodes bounded by $N_{\max} = 350$. The connectivity
radius is $R = 0.1$ and the average erasure $Q = 0.1$.

Fig. 3. Number of queried nodes $K$ versus the average erasure probability
to perform 95% estimation of the network for different running times $t$. The
network contains $n = 300$ nodes bounded by $N_{\max} = 350$. The connectivity
radius is $R = 0.1$.

Fig. 4. Query time $t$ versus the average number of queried nodes to perform
95% estimation of the network for different initial transmit probabilities $F$. The
network contains $n = 300$ nodes bounded by $N_{\max} = 350$. The connectivity
radius is $R = 0.1$ and the average erasure $Q = 0.1$.


Figure 3 illustrates the number of queried nodes $K$ versus
the average erasure probability, for which the total estimation
of network nodes is 95% or more, for different running times
$t$. As expected, the number of queried nodes needed to perform
95% estimation of the whole network size decreases with the
number of iterations of the algorithm. This can be explained by
the fact that as the number of iterations increases, each node
has more knowledge about the network configuration, which
results in fewer queried nodes.

Figure 4 shows the relationship between the query time $t$
and the mean number of queried nodes $K$ needed to achieve 95% or
more of the total estimation of network size. We first note that
for $t = 8$, the counting estimator reaches the BLUE. Hence,
for our setting, $t = 8$ is sufficient for the condition $t \to \infty$. We
also note that increasing the initial transmit probability results
in an improvement of the estimation of the network size.

VI. Conclusion 

This work introduces a novel hybrid size estimation algorithm
for anonymous graphs, in which each node knows only
its unique identifier. A node counting algorithm is proposed
whose output can be used to perform network size estimation
using statistical inference methods. For dense graphs and
specific running times, the paper shows that the proposed
algorithm produces an estimate of the total number of nodes
proportional to the BLUE and that it converges when all the
network nodes are queried. Simulation results show that the
proposed algorithm produces a good estimate when either the
running time or the number of queried nodes is reasonable.
As a future research direction, the proposed conjecture can
be proven, and the results of the paper can be extended
to networks with nodes having non-unique IDs or with a
network topology that is not fixed.

Appendix A
Proof of Lemma 1

This section provides the proof of Lemma 1. The proofs
rely on the auxiliary results of Theorem 1, Theorem 2, Lemma 5,
and Lemma 6, which are available in Appendix B.
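A. Performance for Z(0)

Lemma 2. The estimator $Z(t)$ for $t = 0$ is proportional
to the BLUE $\hat{n}$ of the network size. In other words, we have:

$$Z(0) = N_{\max}\,\alpha_0\,\hat{n}, \tag{A.1}$$

where $\alpha_0 = K/N_{\max}^2$.

Proof: At time $t = 0$, from the initialisation of the
packet $P_{s_i}$ of a node $s_i \in \mathcal{A}$ in Algorithm 1, we have:

$$\mathbb{P}(X_{ij}(0) = 1) = \begin{cases} 1 & \text{if } j = i \\ 0 & \text{otherwise.} \end{cases} \tag{A.2}$$

Hence, for an arbitrary node $s_i \in \mathcal{N}$, we have:

$$\mathbb{P}(X_{ii}(0) = 1) = \mathbb{P}(X_{ii}(0) = 1 \mid s_i \in \mathcal{A})\,\mathbb{P}(s_i \in \mathcal{A}) + \mathbb{P}(X_{ii}(0) = 1 \mid s_i \in \mathcal{D})\,\mathbb{P}(s_i \in \mathcal{D}). \tag{A.3}$$

Since $\mathbb{P}(X_{ii}(0) = 1 \mid s_i \in \mathcal{D}) = 0$, then $\mathbb{P}(X_{ii}(0) = 1) = \dfrac{n}{N_{\max}}$.
Hence, we can write:

$$Z_i(0) = \max_{s_j \in \mathcal{K}} X_{ji}(0) = \begin{cases} X_{ii}(0) & \text{if } s_i \in \mathcal{K} \\ 0 & \text{otherwise.} \end{cases} \tag{A.4}$$

Therefore, we obtain:

$$\begin{aligned} \mathbb{P}(Z_i(0) = 1) &= \mathbb{P}(Z_i(0) = 1 \mid s_i \in \mathcal{K})\,\mathbb{P}(s_i \in \mathcal{K}) + \mathbb{P}(Z_i(0) = 1 \mid s_i \notin \mathcal{K})\,\mathbb{P}(s_i \notin \mathcal{K}) \\ &= \mathbb{P}(X_{ii}(0) = 1)\,\mathbb{P}(s_i \in \mathcal{K}) = \frac{nK}{N_{\max}^2} = n\alpha_0. \end{aligned} \tag{A.5}$$

Using Theorem 2, the estimator $Z$ is proportional to the BLUE
estimate of $n$. ■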


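B. Performance for Z(1)

Lemma 3. The estimator $Z(t)$ for $t = 1$ is proportional
to the BLUE $\hat{n}$ of the network size. In other words, we have:

$$Z(1) = N_{\max}\,\alpha_0\alpha_1\,\hat{n}, \tag{A.6}$$

where $\alpha_0 = \dfrac{K}{N_{\max}^2}$ and $\alpha_1 = 1 + \dfrac{N_{\max} - K}{LW}\,\pi R^2 f(1 - q)$.

Proof: At time $t = 1$, from the initialisation of the
packet of a node $s_i \in \mathcal{A}$ in Algorithm 1 and Lemma 5,
we have, for an alive node $s_j$:

$$\mathbb{P}(X_{ij}(1) = 1 \mid s_j \in \mathcal{A}) = \begin{cases} 1 & \text{if } j = i \\ p_{ij} & \text{if } s_j \in B_1(s_i) \setminus B_0(s_i) \\ 0 & \text{otherwise.} \end{cases} \tag{A.7}$$

For a node $s_j \in B_1(s_i) \setminus B_0(s_i)$, node $s_i \in \mathcal{A}$ knows it is alive if
the two following events occur:

• Node $s_j$ transmits its packet. This event happens with
probability $f_{s_j}$.

• The packet transmitted from $s_j$ to $s_i$ is successfully
received. This event happens with probability $1 - q_{ji}$.

Given that the events are independent, the probability
$p_{ij}$ can be expressed as $p_{ij} = f_{s_j}(1 - q_{ji})$. The probability
that a node $s_i$ knows that a node $s_j$ is alive can therefore be
expressed as:

$$\begin{aligned} \mathbb{P}(X_{ij}(1) = 1) &= \mathbb{P}(X_{ij}(1) = 1 \mid s_j \in \mathcal{A})\,\mathbb{P}(s_j \in \mathcal{A}) + \mathbb{P}(X_{ij}(1) = 1 \mid s_j \in \mathcal{D})\,\mathbb{P}(s_j \in \mathcal{D}) \\ &= \frac{n}{N_{\max}}\,\mathbb{P}(X_{ij}(1) = 1 \mid s_j \in \mathcal{A}) \\ &= \begin{cases} \dfrac{n}{N_{\max}} & \text{if } j = i \\ \dfrac{n}{N_{\max}}\,p_{ij} & \text{if } s_j \in B_1(s_i) \setminus B_0(s_i) \\ 0 & \text{otherwise.} \end{cases} \end{aligned} \tag{A.8}$$

We obtain the expression of $Z_i(t)$ for $t = 1$ as follows:

$$\begin{aligned} \mathbb{P}(Z_i(1) = 1) &= \mathbb{P}(Z_i(1) = 1 \mid s_i \in \mathcal{K})\,\mathbb{P}(s_i \in \mathcal{K}) + \mathbb{P}(Z_i(1) = 1 \mid s_i \notin \mathcal{K})\,\mathbb{P}(s_i \notin \mathcal{K}) \\ &= \frac{nK}{N_{\max}^2} + \frac{N_{\max} - K}{N_{\max}}\,\mathbb{P}(Z_i(1) = 1 \mid s_i \notin \mathcal{K}). \end{aligned} \tag{A.9}$$

The second term can be expressed as:

$$\begin{aligned} \mathbb{P}(Z_i(1) = 1 \mid s_i \notin \mathcal{K}) = {} & \sum_{s_j \in \mathcal{K}} \mathbb{P}\big(Z_i(1) = 1 \mid s_i \in B_1(s_j) \setminus B_0(s_j)\big)\,\mathbb{P}\big(s_i \in B_1(s_j) \setminus B_0(s_j)\big) \\ & + \mathbb{P}\Big(Z_i(1) = 1 \,\Big|\, s_i \notin \bigcup_{s_j \in \mathcal{K}} B_1(s_j)\Big)\,\mathbb{P}\Big(s_i \notin \bigcup_{s_j \in \mathcal{K}} B_1(s_j)\Big). \end{aligned} \tag{A.10}$$

Note that the conditioning on $s_i \notin \mathcal{K}$ is omitted on the right-hand side only
for clarity. We first compute $\mathbb{P}(s_i \in B_1(s_j) \setminus B_0(s_j))$. From
the connectivity condition of two nodes in the network, the
probability can be expressed as $\mathbb{P}(d(s_i, s_j) \le R)$, where $R$ is
the connectivity radius. The nodes are uniformly distributed
in a rectangle of width $W$ and length $L$. Therefore, we have:

$$\mathbb{P}(d(s_i, s_j) \le R) = \frac{\pi R^2}{LW}. \tag{A.11}$$

The term can be simplified as

$$\mathbb{P}(Z_i(1) = 1 \mid s_i \notin \mathcal{K}) = \frac{\pi R^2 n}{LW N_{\max}} \sum_{s_j \in \mathcal{K}} f_{s_j}(1 - q_{ji}). \tag{A.12}$$

If all the nodes have the same erasure probability and initial
transmit probability, the term can further be simplified as:

$$\mathbb{P}(Z_i(1) = 1 \mid s_i \notin \mathcal{K}) = \frac{\pi R^2}{LW}\,\frac{Kn}{N_{\max}}\,f(1 - q). \tag{A.13}$$

The probability that the central unit knows that node $s_i$ is alive
can therefore be written as:

$$\begin{aligned} \mathbb{P}(Z_i(1) = 1) &= \frac{nK}{N_{\max}^2}\Big(1 + \frac{N_{\max} - K}{LW}\,\pi R^2 f(1 - q)\Big) \\ &= n\alpha_0\Big(1 + \frac{N_{\max} - K}{LW}\,\pi R^2 f(1 - q)\Big) \\ &= n\alpha_0\alpha_1. \end{aligned} \tag{A.14}$$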

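C. Performance for Z(∞)

Lemma 4. The estimator $Z(t)$ converges to the BLUE $\hat{n}$ of the
network size as $t$ goes to $\infty$. In other words, we have:

$$\lim_{t \to \infty} Z(t) = \hat{n}. \tag{A.15}$$

Proof: To prove this lemma, we first compute the MLE
estimator $\hat{n}$ of $n$ as $t \to \infty$. From Lemma 6, we note
that the average number of alive neighbours of an arbitrary
alive node is an increasing function of $t$. If we assume that
the whole network is connected, then the average number
of alive neighbours of an arbitrary alive node is a strictly
increasing function bounded by $n$. Therefore, $\exists\, t_0$ such that
$\forall\, t > t_0$, we have $B_t(s_i) = \mathcal{A}$, $\forall\, s_i \in \mathcal{A}$. Given that
$\mathbb{P}(X_{ij}(t) = 1) > 0$, $\forall\, s_j \in \mathcal{A}$, $t > t_0$, and that it is a strictly
increasing function bounded by 1, then $\exists\, t^*$ such that for
$s_j \in \mathcal{A}$:

$$\mathbb{P}(X_{ij}(t^*) = 1) = \begin{cases} 1 & \text{if } j = i \\ \dfrac{n}{N_{\max}} & \text{otherwise.} \end{cases} \tag{A.16}$$

We can write:

$$\mathbb{P}(Z_i(t^*) = 1) = \mathbb{P}(Z_i(t^*) = 1 \mid s_i \in \mathcal{A})\,\mathbb{P}(s_i \in \mathcal{A}) + \mathbb{P}(Z_i(t^*) = 1 \mid s_i \notin \mathcal{A})\,\mathbb{P}(s_i \notin \mathcal{A}). \tag{A.17}$$

Since $\mathbb{P}(Z_i(t^*) = 1 \mid s_i \notin \mathcal{A}) = 0$, then

$$\mathbb{P}(Z_i(t^*) = 1) = \frac{n}{N_{\max}}\,\mathbb{P}(Z_i(t^*) = 1 \mid s_i \in \mathcal{A}). \tag{A.18}$$

Using Theorem 1, the probability $\mathbb{P}(Z_i(t^*) = 1 \mid s_i \in \mathcal{A})$ can
be written as:

$$\mathbb{P}(Z_i(t^*) = 1 \mid s_i \in \mathcal{A}) = 1 - \prod_{s_j \in \mathcal{K}} \big(1 - p_{ji}(t^*)\big), \tag{A.19}$$

where $p_{ji}(t^*) = \mathbb{P}(X_{ji}(t^*) = 1 \mid s_i \in \mathcal{A})$. Two cases can be
distinguished:

• $s_i \in \mathcal{K}$: By substituting $p_{ii} = 1$, we have $\mathbb{P}(Z_i(t^*) =
1 \mid s_i \in \mathcal{A}) = 1$.

• $s_i \notin \mathcal{K}$: By substituting $p_{ji} = \dfrac{n}{N_{\max}}$, we
have $\mathbb{P}(Z_i(t^*) = 1 \mid s_i \in \mathcal{A}) = 1 - \Big(\dfrac{N_{\max} - n}{N_{\max}}\Big)^K$. For
dense networks, we have $\Big(\dfrac{N_{\max} - n}{N_{\max}}\Big)^K \approx 0$, hence
$\mathbb{P}(Z_i(t^*) = 1 \mid s_i \in \mathcal{A}) \approx 1$.

In both cases, we obtain $\mathbb{P}(Z_i(t^*) = 1) = \dfrac{n}{N_{\max}} = \alpha n$, $\forall\, t >
t^*$, with $\alpha = 1/N_{\max}$. Another alternative is to assume that among the $K$ queried
nodes, at least one of the nodes is alive. In that case, we directly
obtain $\mathbb{P}(Z_i(t^*) = 1) = \alpha n$, $\forall\, t > t^*$. Using Theorem 2, the
BLUE estimator $\hat{n}$ of $n$ can be written as:

$$\hat{n} = \frac{1}{N_{\max}\,\alpha} \sum_{i=1}^{N_{\max}} Z_i(t) = \sum_{i=1}^{N_{\max}} Z_i(t) = Z(t). \tag{A.20}$$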
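The dense-network approximation used in the second case of the proof can be checked numerically. The sketch below, with an illustrative function name, evaluates $1 - (1 - n/N_{\max})^K$ for the simulation values $n = 300$ and $N_{\max} = 350$; the list of $K$ values is an arbitrary choice.

```python
def prob_known_when_not_queried(n, n_max, k):
    """1 - (1 - n / n_max)**k: chance that at least one of k queried nodes reports the node."""
    return 1 - (1 - n / n_max) ** k

for k in (1, 2, 5, 10):
    print(k, prob_known_when_not_queried(300, 350, k))
```

The residual term $(50/350)^K$ shrinks geometrically in $K$, which is why a handful of queried nodes already brings the probability very close to 1 in this setting.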


Appendix B
Auxiliary Results

A. Maximum of Bernoulli Random Variables

Theorem 1. Let $X_1, \cdots, X_n$ be independent Bernoulli random
variables with $\mathbb{P}(X_i = 1) = p_i$, $1 \le i \le n$. The random
variable $Z = \max_{1 \le i \le n} X_i$ is a Bernoulli random variable
with parameter $p = \mathbb{P}(Z = 1) = 1 - \prod_{i=1}^{n} (1 - p_i)$.

Proof: Since the only possible values of $X_i$, $1 \le i \le n$,
are 0 and 1, the support of $Z$ is $\{0, 1\}$. We can clearly
see that:

$$\mathbb{P}(Z = 0) = \mathbb{P}(X_1 = 0, \cdots, X_n = 0) = \prod_{i=1}^{n} \mathbb{P}(X_i = 0) = \prod_{i=1}^{n} (1 - p_i). \tag{B.1}$$

Therefore, the random variable $Z = \max_{1 \le i \le n} X_i$ is a
Bernoulli random variable with parameter $p = 1 - \prod_{i=1}^{n} (1 - p_i)$. ■
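Theorem 1 can be verified on a small example by comparing the closed form with a brute-force enumeration of all outcomes. The function names and the probabilities used below are illustrative assumptions.

```python
import itertools
import math

def max_bernoulli_parameter(probabilities):
    """P(max_i X_i = 1) for independent Bernoulli X_i with P(X_i = 1) = p_i."""
    return 1 - math.prod(1 - p for p in probabilities)

def max_bernoulli_by_enumeration(probabilities):
    """Same quantity, obtained by summing the probability of every outcome whose maximum is 1."""
    total = 0.0
    for outcome in itertools.product([0, 1], repeat=len(probabilities)):
        weight = math.prod(p if x else 1 - p for p, x in zip(probabilities, outcome))
        if max(outcome) == 1:
            total += weight
    return total

ps = [0.2, 0.5, 0.9]
print(max_bernoulli_parameter(ps), max_bernoulli_by_enumeration(ps))
```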


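B. Best Linear Unbiased Estimator of Bernoulli Random
Variables

Theorem 2. Let $X_1, \cdots, X_N$ be independent and identically distributed
Bernoulli random variables with $\mathbb{P}(X_i = 1) = n\alpha$, $1 \le
i \le N$, where $\alpha$ is a constant that does not depend on $n$. The
quantity $\sum_{i=1}^{N} X_i$ is proportional to the Maximum Likelihood
Estimator (MLE) of the quantity $n$. Moreover, the estimator
$\hat{n} = \dfrac{\sum_{i=1}^{N} X_i}{N\alpha}$ is the best linear unbiased estimator (BLUE)
of the quantity $n$.

Proof: The likelihood function of $(X_1, \cdots, X_N)$ can
be written as:

$$f_{X_1, \cdots, X_N}(x_1, \cdots, x_N) = \prod_{i=1}^{N} (n\alpha)^{x_i} (1 - n\alpha)^{1 - x_i},$$

$$\log(f_{X_1, \cdots, X_N}) \propto \sum_{i=1}^{N} x_i \log(n) + (1 - x_i)\log(1 - n\alpha). \tag{B.2}$$

Solving the equation $\dfrac{\partial}{\partial n}\log(f_{X_1, \cdots, X_N}) = 0$ yields the
following MLE:

$$\hat{n} = \frac{\sum_{i=1}^{N} X_i}{N\alpha}. \tag{B.3}$$

Therefore, $\sum_{i=1}^{N} X_i$ is proportional to the MLE of the quantity
$n$. The mean of $\hat{n}$ can be expressed as:

$$\mathbb{E}[\hat{n}] = \frac{\sum_{i=1}^{N} \mathbb{E}[X_i]}{N\alpha} = \frac{Nn\alpha}{N\alpha} = n, \tag{B.4}$$

which shows that the estimator is unbiased. The variance
can be obtained as follows:

$$\mathrm{Var}(\hat{n}) = \frac{\sum_{i=1}^{N} \mathrm{Var}(X_i)}{N^2\alpha^2} = \frac{Nn\alpha(1 - n\alpha)}{N^2\alpha^2} = \frac{n(1 - n\alpha)}{N\alpha}. \tag{B.5}$$

Computing the Fisher information yields:

$$\begin{aligned} \mathcal{I}(n) &= \mathbb{E}\Big[\frac{\sum_{i=1}^{N} X_i}{n^2}\Big] + \mathbb{E}\Big[\frac{\alpha^2 \sum_{i=1}^{N} (1 - X_i)}{(1 - n\alpha)^2}\Big] \\ &= \frac{N\alpha}{n} + \frac{N\alpha^2}{1 - n\alpha} = \frac{N\alpha}{n(1 - n\alpha)}. \end{aligned} \tag{B.6}$$

Since the variance of $\hat{n}$ equals the inverse of the Fisher information,
$\hat{n}$ is the BLUE of the quantity $n$. ■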
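The closing argument, that the variance in (B.5) equals the inverse of the Fisher information in (B.6), can be checked with exact rational arithmetic. The sketch below uses illustrative function names and the values $N = N_{\max} = 350$, $n = 300$, $\alpha = 1/N_{\max}$, which are assumptions taken from the simulation setting.

```python
from fractions import Fraction

def blue_estimate(samples, alpha):
    """n-hat = sum(X_i) / (N * alpha) for a list of 0/1 observations."""
    return Fraction(sum(samples)) / (len(samples) * alpha)

def blue_variance(n, n_samples, alpha):
    return n * (1 - n * alpha) / (n_samples * alpha)

def fisher_information(n, n_samples, alpha):
    return n_samples * alpha / (n * (1 - n * alpha))

alpha = Fraction(1, 350)
print(blue_variance(300, 350, alpha) * fisher_information(300, 350, alpha))  # 1
print(blue_estimate([1, 0, 1, 1], Fraction(1, 8)))                           # 6
```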
C. Maximum Number of Reachable Nodes

Lemma 5. The maximum number of neighbours node $s_i$ can
know at time instant $t$ is $|B_t(s_i)|$.

Proof: The proof is a direct consequence of the data
dissemination Algorithm 2. At each round, each node can
transmit to all of its neighbours. Hence, after $t$ rounds, the
information initiated at a node $s_i$ would have travelled at
most inside $B_t(s_i)$. Due to the symmetry of the problem, the
farthest information node $s_i$ can get is initiated inside $B_t(s_i)$,
which shows that the maximum number of neighbours node
$s_i$ can know at time instant $t$ is $|B_t(s_i)|$. ■


D. Average Number of t-degree Neighbours

Lemma 6. The average number of nodes in $B_t(s_i) \setminus
B_{t-1}(s_i)$, $t \ge 1$, can be approximated by:

$$\frac{n\pi R^2 (2t - 1)}{LW}. \tag{B.7}$$

Proof: To prove this lemma, we first show that $|B_t(s_i)| =
\dfrac{n\pi R^2 t^2}{LW}$. Using the fact that $B_k(s_i) \subseteq B_t(s_i)$, $\forall\, k \le t$, we
can write $|B_t(s_i) \setminus B_{t-1}(s_i)| = |B_t(s_i)| - |B_{t-1}(s_i)|$, which
concludes the proof.

We prove that $|B_t(s_i)| = \dfrac{n\pi (tR)^2}{LW}$ by induction. For
$t = 1$, we can clearly see from the definition of $B_1(s_i)$ that
$s_j \in B_1(s_i)$ if and only if $d(s_i, s_j) \le R$. Since the nodes
are uniformly distributed over $[0, L] \times [0, W]$ and neglecting the
boundary effects, the average number of nodes is $\dfrac{n\pi R^2}{LW}$. Assume
the proposition holds for $t$ and that $s_j \in B_t(s_i)$ if and
only if $d(s_i, s_j) \le tR$. Assume $\exists\, s_j \in B_{t+1}(s_i)$ such that
$d(s_j, s_i) > (t + 1)R$. From the triangle inequality of the
distance operator, we can write for every node $s_k \in B_t(s_i)$:

$$d(s_i, s_j) \le d(s_i, s_k) + d(s_k, s_j). \tag{B.8}$$

From the assumption at step $t$, we have $d(s_i, s_k) \le tR$.
Therefore, we obtain:

$$d(s_k, s_j) \ge d(s_i, s_j) - d(s_i, s_k) > (t + 1)R - tR = R. \tag{B.9}$$

Since for nodes $s$ and $s'$ to be connected we should have
$d(s, s') \le R$, node $s_j$ is not connected to any node
$s_k \in B_t(s_i)$, which contradicts $s_j \in B_{t+1}(s_i)$. Therefore,
$d(s_j, s_i) \le (t + 1)R$. This last
expression translates to the fact that the average number of
nodes in $B_{t+1}(s_i)$ is $\dfrac{n\pi (t + 1)^2 R^2}{LW}$. Finally, using the fact
that $|B_t(s_i) \setminus B_{t-1}(s_i)| = |B_t(s_i)| - |B_{t-1}(s_i)|$, we conclude
that

$$|B_t(s_i) \setminus B_{t-1}(s_i)| = \frac{n\pi R^2 (2t - 1)}{LW}. \tag{B.10}$$
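The relation between the ring sizes of Lemma 6 and the ball sizes used in its proof can be confirmed numerically. The sketch below, with illustrative function names and the simulation values $n = 300$, $R = 0.1$, $L = W = 1$, checks that each ring is the difference of two consecutive balls; like the lemma, it neglects boundary effects.

```python
import math

def ball_size(t, n, radius, length, width):
    """Average number of nodes within distance t * radius of a node."""
    return n * math.pi * (t * radius) ** 2 / (length * width)

def ring_size(t, n, radius, length, width):
    """Average number of nodes in B_t minus B_{t-1}."""
    return n * math.pi * radius**2 * (2 * t - 1) / (length * width)

args = dict(n=300, radius=0.1, length=1.0, width=1.0)
for t in range(1, 4):
    print(t, ring_size(t, **args), ball_size(t, **args) - ball_size(t - 1, **args))
```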